

 multi-omic data


Transforming Multi-Omics Integration with GANs: Applications in Alzheimer's and Cancer

Reza, Md Selim, Afroz, Sabrin, Rahman, Mostafizer, Alam, Md Ashad

arXiv.org Machine Learning

Multi-omics data integration is crucial for understanding complex diseases, yet limited sample sizes, noise, and heterogeneity often reduce predictive power. To address these challenges, we introduce Omics-GAN, a Generative Adversarial Network (GAN)-based framework designed to generate high-quality synthetic multi-omics profiles while preserving biological relationships. We evaluated Omics-GAN on three omics types (mRNA, miRNA, and DNA methylation) using the ROSMAP cohort for Alzheimer's disease (AD) and TCGA datasets for colon and liver cancer. A support vector machine (SVM) classifier with repeated 5-fold cross-validation demonstrated that synthetic datasets consistently improved prediction accuracy compared with the original omics profiles. For mRNA, the SVM AUC improved from 0.72 to 0.74 in AD and from 0.68 to 0.72 in liver cancer. Synthetic miRNA enhanced classification in colon cancer from 0.59 to 0.69, while synthetic methylation data improved performance in liver cancer from 0.64 to 0.71. Boxplot analyses confirmed that synthetic data preserved statistical distributions while reducing noise and outliers. Feature selection identified significant genes overlapping with the original datasets and revealed additional candidates validated by GO and KEGG enrichment analyses. Finally, molecular docking highlighted potential drug repurposing candidates, including Nilotinib for AD, Atovaquone for liver cancer, and Tecovirimat for colon cancer. Omics-GAN enhances disease prediction, preserves biological fidelity, and accelerates biomarker and drug discovery, offering a scalable strategy for precision medicine applications.
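
The AUC comparisons above come from repeated 5-fold cross-validation. As a minimal, stdlib-only sketch of that evaluation protocol (not the authors' pipeline, and with the SVM itself left out), the hypothetical helpers `repeated_stratified_kfold` and `auc` below generate stratified folds that are reshuffled on every repeat and compute ROC-AUC as the probability that a random positive sample outscores a random negative one.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def repeated_stratified_kfold(
    labels: Sequence[int], k: int = 5, repeats: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs; each repeat reshuffles every class and deals it round-robin into k folds."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds: List[List[int]] = [[] for _ in range(k)]
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            for j, i in enumerate(idx):
                folds[j % k].append(i)  # round-robin keeps class proportions similar across folds
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In a full pipeline, a classifier would be fit on each `train` split and scored on the matching `test` split, and the reported AUC would be the mean over all `k * repeats` folds.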


Robust Multi-Omics Integration from Incomplete Modalities Significantly Improves Prediction of Alzheimer's Disease

Park, Sungjoon, Lee, Kyungwook, Yim, Soorin, Hwang, Doyeong, Kim, Dongyun, Lee, Soonyoung, Dunn, Amy, Gatti, Daniel, Chesler, Elissa, O'Connell, Kristen, Kim, Kiyoung

arXiv.org Artificial Intelligence

Multi-omics data capture complex biomolecular interactions and provide insights into metabolism and disease. However, missing modalities hinder integrative analysis across heterogeneous omics. To address this, we present MOIRA (Multi-Omics Integration with Robustness to Absent modalities), an early integration method enabling robust learning from incomplete omics data via representation alignment and adaptive aggregation. MOIRA leverages all samples, including those with missing modalities, by projecting each omics dataset onto a shared embedding space where a learnable weighting mechanism fuses them. Evaluated on the Religious Orders Study and Memory and Aging Project (ROSMAP) dataset for Alzheimer's Disease (AD), MOIRA outperformed existing approaches, and further ablation studies confirmed modality-wise contributions. Feature importance analysis revealed AD-related biomarkers consistent with prior literature, highlighting the biological relevance of our approach.
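
One way to picture "a learnable weighting mechanism" that still works when a modality is absent is a softmax over per-modality logits that is renormalised over only the observed modalities. The sketch below assumes exactly that, with hypothetical names (`fuse_available`, the modality keys, and the logit values); the abstract does not specify MOIRA's actual aggregation rule, and in training the logits would be learned parameters rather than constants.

```python
import math
from typing import Dict, List, Optional


def fuse_available(
    embeddings: Dict[str, Optional[List[float]]], logits: Dict[str, float]
) -> List[float]:
    """Weighted sum of the observed modality embeddings (all assumed to share one dimension).

    Missing modalities are passed as None; the softmax over per-modality logits is
    renormalised over the modalities that are actually present for this sample.
    """
    present = [m for m, e in embeddings.items() if e is not None]
    if not present:
        raise ValueError("at least one modality must be observed")
    top = max(logits[m] for m in present)  # subtract the max for a numerically stable softmax
    exp = {m: math.exp(logits[m] - top) for m in present}
    total = sum(exp.values())
    fused = [0.0] * len(embeddings[present[0]])
    for m in present:
        weight = exp[m] / total
        for j, value in enumerate(embeddings[m]):
            fused[j] += weight * value
    return fused


sample = {"mRNA": None, "miRNA": [1.0, 0.0], "methylation": [0.0, 1.0]}
print(fuse_available(sample, {"mRNA": 0.3, "miRNA": 0.0, "methylation": 0.0}))  # [0.5, 0.5]
```

Because the missing mRNA embedding is simply excluded, the sample still contributes to training, which is the property the abstract emphasises.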


An Interpretable Ensemble Framework for Multi-Omics Dementia Biomarker Discovery Under HDLSS Conditions

Lee, Byeonghee, Kang, Joonsung

arXiv.org Artificial Intelligence

The advent of multi-omics technologies has revolutionized biomedical research, enabling simultaneous interrogation of genomic, transcriptomic, proteomic, and metabolomic layers [Wang et al., 2021a]. This integrative paradigm has yielded unprecedented insights into the molecular architecture of complex diseases, particularly neurodegenerative disorders such as Alzheimer's disease. However, multi-omics datasets are often characterized by high-dimensional variables and limited sample sizes, a configuration known as high-dimension low-sample size (HDLSS). Under such constraints, conventional statistical methods suffer from reduced power and unrealistic assumptions [Fan and Lv, 2008], while deep learning models may exhibit overfitting and lack interpretability [LeCun et al., 2015]. Recent advances in dementia biomarker discovery have embraced multi-omics integration. For example, Iturria-Medina [2018] fused neuroimaging and omics data to identify disease-relevant signatures. Zhang [2020] employed transcriptomic-proteomic fusion to uncover molecular markers, and Lee [2022] demonstrated the discriminative utility of metabolomic features in Alzheimer's pathology. These efforts build upon foundational work in integrative omics [Hasin, 2017, Karczewski and Snyder, 2018], yet challenges persist in elucidating latent gene networks and selecting statistically robust features amidst inter-feature dependencies.


scI2CL: Effectively Integrating Single-cell Multi-omics by Intra- and Inter-omics Contrastive Learning

Liu, Wuchao, Peng, Han, Li, Wengen, Zhang, Yichao, Guan, Jihong, Zhou, Shuigeng

arXiv.org Artificial Intelligence

Single-cell multi-omics data contain rich information about cellular states, and analyzing these data can reveal valuable insights into cellular heterogeneity, diseases, and biological processes. However, because cell differentiation and development is a continuous and dynamic process, it remains challenging to computationally model and infer cell interaction patterns from single-cell multi-omics data. This paper presents scI2CL, a new single-cell multi-omics fusion framework based on intra- and inter-omics contrastive learning, which learns comprehensive and discriminative cellular representations from complementary multi-omics data for various downstream tasks. Extensive experiments on four downstream tasks validate the effectiveness of scI2CL and its superiority over existing methods. Concretely, in cell clustering, scI2CL surpasses eight state-of-the-art methods on four widely used real-world datasets. In cell subtyping, scI2CL effectively distinguishes three latent monocyte cell subpopulations that existing methods do not discover. Moreover, scI2CL is the only method that correctly constructs the cell developmental trajectory from hematopoietic stem and progenitor cells to memory B cells. In addition, scI2CL resolves the misclassification of cell types between two subpopulations of CD4+ T cells, whereas existing methods fail to precisely distinguish the mixed cells. In summary, scI2CL can accurately characterize cross-omics relationships among cells, thereby effectively fusing multi-omics data and learning discriminative cellular representations to support various downstream analysis tasks.
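
Inter-omics contrastive learning typically treats the two omics views of the same cell as a positive pair and every other cell as a negative. The sketch below shows a generic one-directional InfoNCE loss with cosine similarity and a temperature; it is a standard formulation offered for illustration, not scI2CL's exact objective, and the toy `rna`/`atac` embeddings are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; vectors are assumed to be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(view_a: List[List[float]], view_b: List[List[float]], temperature: float = 0.1) -> float:
    """Mean cross-entropy of identifying cell i's view_b embedding among all view_b embeddings.

    Row i of both views is assumed to be the same cell (the positive pair);
    every other row of view_b acts as a negative.
    """
    n = len(view_a)
    loss = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(v - top) for v in logits))  # log-sum-exp
        loss += log_denominator - logits[i]
    return loss / n


rna = [[1.0, 0.0], [0.0, 1.0]]
atac_aligned = [[0.9, 0.1], [0.1, 0.9]]
atac_swapped = [[0.1, 0.9], [0.9, 0.1]]
print(info_nce(rna, atac_aligned) < info_nce(rna, atac_swapped))  # True
```

The loss is small when each cell's two views are each other's nearest neighbours and large when they are mismatched, which is what pushes the encoders toward a shared cross-omics representation. An intra-omics term would apply the same loss to two augmented views of a single modality.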



MOTGNN: Interpretable Graph Neural Networks for Multi-Omics Disease Classification

Yang, Tiantian, Chen, Zhiqian

arXiv.org Machine Learning

Integrating multi-omics data, such as DNA methylation, mRNA expression, and microRNA (miRNA) expression, offers a comprehensive view of the biological mechanisms underlying disease. However, the high dimensionality and complex interactions among omics layers present major challenges for predictive modeling. We propose Multi-Omics integration with Tree-generated Graph Neural Network (MOTGNN), a novel and interpretable framework for binary disease classification. MOTGNN employs eXtreme Gradient Boosting (XGBoost) to perform omics-specific supervised graph construction, followed by modality-specific Graph Neural Networks (GNNs) for hierarchical representation learning, and a deep feedforward network for cross-omics integration. On three real-world disease datasets, MOTGNN outperforms state-of-the-art baselines by 5-10% in accuracy, ROC-AUC, and F1-score, and remains robust to severe class imbalance (e.g., 87.2% vs. 33.4% F1 on imbalanced data). The model maintains computational efficiency through sparse graphs (2.1-2.8 edges per node) and provides built-in interpretability, revealing both top-ranked biomarkers and the relative contributions of each omics modality. These results highlight MOTGNN's potential to improve both predictive accuracy and interpretability in multi-omics disease modeling.
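
The abstract does not say exactly how XGBoost trees are turned into feature graphs. As a purely hypothetical illustration of the idea, the sketch below links each split feature to the features split on directly beneath it, merges the edges across trees, and reports edges per node as |E| / |V|. The tree encoding, the parent-child rule, and the names `split`, `add_tree`, and `feature_graph` are all assumptions, not MOTGNN's construction.

```python
from typing import List, Optional, Set, Tuple

Node = Optional[dict]  # {"feature": str, "left": Node, "right": Node}; None marks a leaf


def add_tree(node: Node, nodes: Set[str], edges: Set[Tuple[str, str]]) -> None:
    """Record every split feature as a node and link each split to its child splits."""
    if node is None:
        return
    nodes.add(node["feature"])
    for child in (node["left"], node["right"]):
        if child is not None:
            a, b = sorted((node["feature"], child["feature"]))
            if a != b:  # skip self-loops when a feature is split on twice in a row
                edges.add((a, b))
            add_tree(child, nodes, edges)


def feature_graph(trees: List[Node]) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Union of undirected parent-child feature edges over all trees."""
    nodes: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    for tree in trees:
        add_tree(tree, nodes, edges)
    return nodes, edges


def split(feature: str, left: Node = None, right: Node = None) -> dict:
    """Hypothetical helper for writing small trees by hand."""
    return {"feature": feature, "left": left, "right": right}


trees = [split("A", split("B"), split("C")), split("B", split("A"))]
nodes, edges = feature_graph(trees)
print(sorted(edges), len(edges) / len(nodes))  # [('A', 'B'), ('A', 'C')] 0.6666666666666666
```

Because only features that the supervised trees actually split on become nodes, and only adjacent splits become edges, graphs built this way stay sparse, which is consistent in spirit with the low edges-per-node figures quoted above.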


Multi-Omics Analysis for Cancer Subtype Inference via Unrolling Graph Smoothness Priors

Lu, Jielong, Wu, Zhihao, Yu, Jiajun, Bu, Jiajun, Wang, Haishuai

arXiv.org Artificial Intelligence

Integrating multi-omics datasets through data-driven analysis offers a comprehensive understanding of the complex biological processes underlying various diseases, particularly cancer. Graph Neural Networks (GNNs) have recently demonstrated remarkable ability to exploit relational structures in biological data, enabling advances in multi-omics integration for cancer subtype classification. Existing approaches often neglect the intricate coupling between heterogeneous omics, limiting their capacity to resolve subtle cancer subtype heterogeneity critical for precision oncology. To address these limitations, we propose a framework named Graph Transformer for Multi-omics Cancer Subtype Classification (GTMancer). This framework builds upon the GNN optimization problem and extends its application to complex multi-omics data. Specifically, our method leverages contrastive learning to embed multi-omics data into a unified semantic space. We unroll the multiplex graph optimization problem in that unified space and introduce dual sets of attention coefficients to capture structural graph priors both within and among multi-omics data. This approach enables global omics information to guide the refinement of individual omics representations. Empirical experiments on seven real-world cancer datasets demonstrate that GTMancer outperforms existing state-of-the-art algorithms.
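
"Unrolling" a graph smoothness prior usually means taking a fixed number of gradient steps on an objective such as 1/2 ||x - f||^2 + lam/2 x^T L x, where f is the input signal and L = D - A is the graph Laplacian, and treating each step as one network layer. The single-graph sketch below assumes fixed edge weights; GTMancer instead learns dual sets of attention coefficients on a multiplex graph, so this only illustrates the underlying mechanism. The names `unrolled_smoothing`, `lam`, `step`, and `n_layers` are hypothetical, and the step size must satisfy step * (1 + lam * lambda_max(L)) < 2 for the iterations to converge.

```python
from typing import List

Matrix = List[List[float]]


def laplacian_apply(adj: Matrix, x: List[float]) -> List[float]:
    """(L x)_i = sum_j A_ij (x_i - x_j) for the combinatorial Laplacian L = D - A."""
    return [sum(a * (x[i] - x[j]) for j, a in enumerate(row)) for i, row in enumerate(adj)]


def dirichlet_energy(adj: Matrix, x: List[float]) -> float:
    """x^T L x = 1/2 * sum_ij A_ij (x_i - x_j)^2 for a symmetric adjacency matrix."""
    return 0.5 * sum(a * (x[i] - x[j]) ** 2 for i, row in enumerate(adj) for j, a in enumerate(row))


def unrolled_smoothing(
    adj: Matrix, f: List[float], lam: float = 1.0, step: float = 0.1, n_layers: int = 10
) -> List[float]:
    """Each 'layer' is one gradient step on 1/2 ||x - f||^2 + lam/2 * x^T L x."""
    x = list(f)
    for _ in range(n_layers):
        lx = laplacian_apply(adj, x)
        # gradient of the objective is (x - f) + lam * L x
        x = [xi - step * ((xi - fi) + lam * lxi) for xi, fi, lxi in zip(x, f, lx)]
    return x


path = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
signal = [0.0, 1.0, 0.0]
smoothed = unrolled_smoothing(path, signal)
print(dirichlet_energy(path, signal), dirichlet_energy(path, smoothed))
```

The smoothed signal has lower Dirichlet energy than the input, meaning neighbouring nodes agree more. Replacing the fixed adjacency weights with learned attention coefficients is what turns such an unrolled solver into a trainable graph layer.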


Graph Neural Networks in Multi-Omics Cancer Research: A Structured Survey

Zohari, Payam, Chehreghani, Mostafa Haghir

arXiv.org Artificial Intelligence

Multi-omics data integration has emerged as a powerful strategy to unravel the complex biological underpinnings of cancer. Recent advancements in graph neural networks (GNNs) offer an effective framework to model heterogeneous and structured omics data, enabling precise representation of molecular interactions and regulatory networks. This systematic review explores several recent studies that leverage GNN-based architectures in multi-omics cancer research. We classify the approaches based on their targeted omics layers, graph neural network structures, and biological tasks such as subtype classification, prognosis prediction, and biomarker discovery. The analysis reveals a growing trend toward hybrid and interpretable models, alongside increasing adoption of attention mechanisms and contrastive learning. Furthermore, we highlight the use of patient-specific graphs and knowledge-driven priors as emerging directions. This survey serves as a comprehensive resource for researchers aiming to design effective GNN-based pipelines for integrative cancer analysis, offering insights into current practices, limitations, and potential future directions.


MoXGATE: Modality-aware cross-attention for multi-omic gastrointestinal cancer sub-type classification

Dip, Sajib Acharjee, Shuvo, Uddip Acharjee, Mallick, Dipanwita, Abir, Abrar Rahman, Zhang, Liqing

arXiv.org Artificial Intelligence

Cancer subtype classification is crucial for personalized treatment and prognostic assessment. However, effectively integrating multi-omic data remains challenging due to the heterogeneous nature of genomic, epigenomic, and transcriptomic features. In this work, we propose MoXGATE (Modality-Aware Cross-Attention), a novel deep-learning framework that leverages cross-attention and learnable modality weights to enhance feature fusion across multiple omics sources. Our approach effectively captures inter-modality dependencies, ensuring robust and interpretable integration. Through experiments on Gastrointestinal Adenocarcinoma (GIAC) and Breast Cancer (BRCA) datasets from TCGA, we demonstrate that MoXGATE outperforms existing methods, achieving 95% classification accuracy. Ablation studies validate the effectiveness of cross-attention over simple concatenation and highlight the importance of different omics modalities. Moreover, our model generalizes well to unseen cancer types, e.g., breast cancer, underscoring its adaptability. Key contributions include (1) a cross-attention-based multi-omic integration framework, (2) modality-weighted fusion for enhanced interpretability, (3) application of focal loss to mitigate data imbalance, and (4) validation across multiple cancer subtypes. Our results indicate that MoXGATE is a promising approach for multi-omic cancer subtype classification, offering improved performance and biological generalizability.
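
Contribution (3) uses focal loss against class imbalance. The standard focal loss is -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class; the (1 - p_t)^gamma factor shrinks the loss of well-classified examples so training concentrates on hard ones. The sketch below uses a single scalar `alpha` for simplicity (the original formulation allows per-class weights); the parameter values MoXGATE uses are not given in the abstract.

```python
import math
from typing import List, Sequence


def focal_loss(
    probs: List[Sequence[float]], targets: Sequence[int], gamma: float = 2.0, alpha: float = 1.0
) -> float:
    """Mean focal loss -alpha * (1 - p_t)^gamma * log(p_t), where p_t is the probability of the true class.

    With gamma = 0 and alpha = 1 this reduces to ordinary cross-entropy.
    """
    total = 0.0
    for row, t in zip(probs, targets):
        p_t = min(max(row[t], 1e-12), 1.0)  # clip to avoid log(0)
        total += -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(targets)


easy, hard = [[0.9, 0.1]], [[0.2, 0.8]]
for name, p in (("easy", easy), ("hard", hard)):
    print(name, focal_loss(p, [0], gamma=0.0), focal_loss(p, [0], gamma=2.0))
```

With gamma = 2, the easy example keeps only 1% of its cross-entropy loss while the hard one keeps 64%, so abundant, easily classified majority-class samples contribute far less to the gradient.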


OmicsCL: Unsupervised Contrastive Learning for Cancer Subtype Discovery and Survival Stratification

Karagoz, Atahan

arXiv.org Artificial Intelligence

Unsupervised learning of disease subtypes from multi-omics data presents a significant opportunity for advancing personalized medicine. We introduce OmicsCL, a modular contrastive learning framework that jointly embeds heterogeneous omics modalities, such as gene expression, DNA methylation, and miRNA expression, into a unified latent space. Our method incorporates a survival-aware contrastive loss that encourages the model to learn representations aligned with survival-related patterns, without relying on labeled outcomes. Evaluated on the TCGA BRCA dataset, OmicsCL uncovers clinically meaningful clusters and achieves strong unsupervised concordance with patient survival. The framework demonstrates robustness across hyperparameter configurations and can be tuned to prioritize either subtype coherence or survival stratification. Ablation studies confirm that integrating survival-aware loss significantly enhances the predictive power of learned embeddings. These results highlight the promise of contrastive objectives for biological insight discovery in high-dimensional, heterogeneous omics data.
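
Concordance with survival is commonly measured with Harrell's C-index: among patient pairs whose ordering of outcomes is known, it is the fraction in which the patient assigned the higher risk fails first. The abstract does not state which estimator OmicsCL reports, so the sketch below is a simplified, generic version (tied times are skipped, tied risks count 0.5) with a hypothetical `concordance_index` function; the risk scores would come from the learned embeddings or clusters.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's C-index: share of comparable pairs in which the higher-risk patient fails first.

    A pair (i, j) is comparable when t_i < t_j and patient i had an observed event;
    pairs with tied times are skipped in this simplified version. Tied risks count 0.5.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier failure in a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


print(concordance_index([1, 2, 3], [1, 0, 1], [2.0, 3.0, 1.0]))  # 0.5
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect agreement between predicted risk and observed survival order.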